-
Notifications
You must be signed in to change notification settings - Fork 227
chore: Replace isinstance(obj, T) with type(obj) is T comparisons #1292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
/ok to test |
|
/ok to test 0db38d0 |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work. I downloaded your script and was able to reproduce a similar result (5.23x faster) on my laptop. The math looks sound. Just to make sure, I replaced your manual calculations with Python's builtin timeit.timeit and pyperf (the latter being the sort of "gold standard" for accurate perf timings in Python). But the result is all roughly the same, and 5x is large enough that that level of accuracy doesn't really matter -- it's an obvious big win.
My only concern with this PR is backward compatibility. It is technically possible to subclass either a numpy or ctypes datatype right now and it would be accepted and work here with the isinstance check but would no longer be accepted after this change. I don't know how often that actually happens in practice, and our test suite obviously doesn't do that. I'm not sure how to assess how much we care about this -- it seems hard to do a GitHub code search for, for example. @leofang, thoughts?
If we determine we do want to be strict about backward compatibility, we could probably do:
if arg_type is ctypes_bool:
...
elif ...
...
else:
# If no exact types are found, fallback to slower `isinstance` check
if isinstance(arg_type, ctypes_bool):
...
elif:
...
else:
return 1
Note that the fallback cases are in a separate if/elif/else block so that Cython can still optimize the outer one to a C switch statement.
I suspect that would not have a significant impact on the benchmark (which doesn't exercise subclasses). If we go this route, we should also add a test that creates a subclass of a ctype and numpy type and confirms that it works and does the right thing.
Outside of backward compatibility, I think it's reasonable to continue supporting subclasses but to fast path the most common types that we expect users to use, so I'm in support of taking this approach. |
To clarify my understanding, would it be enough to do: if arg_type is ctypes_bool:
...
elif isinstance(arg_type, ctypes_bool):
...
else:
return 1or should the check be done manually for all possible subclasses of type if arg_type is T:
...
elif arg_type is subclass_1:
...
elif arg_type is subclass_2:
...
elif isinstance(arg_type, T):
...
else:
return 1 |
|
/ok to test |
The subclasses would likely be user created / controlled, so we can't possibly cover them all, so we should go with the first option. If we find there's some kind of common subclass then it's something we could explore fast pathing in the future. |
|
I am not too worried about backward compatibility in this case, because it is rare in practice that someone would e.g. subclass If we want to keep bc, Mike's suggestion on shuffling all existing |
So basically retaining this structure, but adding all the if arg_type is T1:
...
elif arg_type is T2:
...
elif arg_type is T3:
...
else:
if isinstance(arg_type, T1):
...
elif isinstance(arg_type, T2):
...
elif isinstance(arg_type, T3):
...
else:
return 1 |
|
Yes, this was what Mike suggested 🙂 |
Signed-off-by: Bharat Raghunathan <[email protected]>
|
/ok to test |
|
/ok to test 5e09e0c |
mdboom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Thanks again. Let's merge this once the tests have passed.
mdboom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there are 2 test failures here that will need to be addressed. I'm not exactly sure why they are failing, but let me know if I can help once you've had a chance to look.
Signed-off-by: Bharat Raghunathan <[email protected]>
leofang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think commit ed81c11 is correct. The bool treatment should not be changed. @bharatr21 have you been able to run the test suite locally and confirm everything passes before pushing?
It looks like this kernel logic is treating the Python primitive |
|
Let us know if you have trouble setting up an environment! I tried bisecting locally and it seems the test failures in |
May I get some guidance on how to setup the environment on a system without GPUs? I'm also slightly surprised/confused, as all CI checks here seem to have passed since |
|
The old set of checks was like this: Since This PR moves this to: The ordering has changed, but the ordering also should no longer matter because it's now doing exact type checks. When Cython converts a Python bool to C, it is defined as being an So, I think @bharatr21's change here is correct. I played a bit with specifying it this way: but this doesn't work. but that also doesn't work. I can't find a way to specify "C int" from Cython. However, The only small tweak to make is that it seems that the explicit Python cast from a |
mdboom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (see my longer explanation). Just need to remove the unnecessary casts.
Yeah, I think unfortunately we can't really test this without a GPU. There may be other suggestions, but you can get 30 days/$200 free credits on Azure to at least get started. Not a long term solution, though. |
Ah! I was playing around with this last night, but for some reason I missed checking this combination. Good catch, Mike!
That's right. The |
Explicit cast not needed since `prepare_arg` does it automatically Co-authored-by: Leo Fang <[email protected]>
|
/ok to test 0fc1ed8 |
|
I've added commit 0fc1ed8 here to fix the failing test (so it expects a bool) and added a changelog comment (since that's technically a breaking change). It remains a mystery to me why this test, testing passing bool values to a kernel, didn't break. cuda-python/cuda_core/tests/test_launcher.py Line 157 in c860f3f
set_handle function vs. an actual kernel.
|
I figured out why this is the case. On
On this
|
| # TODO: revisit this treatment if we decide to cythonize cuda.core | ||
| if isinstance(arg, driver.CUgraphConditionalHandle): | ||
| if arg_type is driver.CUgraphConditionalHandle: | ||
| prepare_arg[intptr_t](self.data, self.data_addresses, <intptr_t>int(arg), i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like in the header it's a typedef of unsigned long long, let's be explicit:
| prepare_arg[intptr_t](self.data, self.data_addresses, <intptr_t>int(arg), i) | |
| prepare_arg[unsigned long long](self.data, self.data_addresses, arg, i) |
alternatively, I think this should work too
| prepare_arg[intptr_t](self.data, self.data_addresses, <intptr_t>int(arg), i) | |
| prepare_arg[cydriver.CUgraphConditionalHandle](self.data, self.data_addresses, arg, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first option doesn't work because the Cython parser can't handle unsigned long long in that place. It would have to be some sort of typedef to unsigned long long. But I like the second option a lot better anyway -- seems more explicit about what it needs to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first option doesn't work because the Cython parser can't handle
unsigned long longin that place.
Ah yes. Need to add it to the list under the supported_type fused type. But I agree the 2nd option is better.
|
Thanks, @mdboom @bharatr21. Looks like this PR is now both an enhancement (better perf) and bug fix (exposing two type casting bugs) 😄 |
My last commit improves the tests to test creating conditional handles in both ways: I think to strictly follow what upstream cudaGraphConditionalHandle does, ...and this is now being tested. |
|
/ok to test 48a0856 |
| return prepare_arg[cpp_double_complex](data, data_addresses, arg, idx) | ||
| else: | ||
| return 1 | ||
| # If no exact types are found, fallback to slower `isinstance` check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this block might add penalty to ctypes and graph conditional handle dispatch, since they are tried after NumPy. I dunno how much this costs, though.
| - Python ``bool`` objects are now converted to C++ ``bool`` type when passed as kernel | ||
| arguments. Previously, they were converted to ``int``. This brings them inline | ||
| with ``ctypes.c_bool`` and ``numpy.bool_``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: I think this no longer applies? Should we just add another entry to mention the perf improvement
|
/ok to test d9a7c14 |
Description
closes #1282
Replace
isinstance(obj, T)checks withtype(obj) is Tto optimizecuda.core.launch()Additional Notes
I made a benchmarking script in Cython to prove the speedup of using
type()in place ofisinstance()checks since the original issue requested profiling which resulted in an ~5x speedup.Appreciate some guidance to know if I've done the profiling right
Created a file
benchmark_isinstance_cython.pyx:I mainly used the compiler flags
-O3and-march=nativeand compiled and ran the above benchmark via this setup scriptsetup_benchmark.py:The script was then run with
python setup_benchmark.py build_ext --inplaceUPDATE: Updated the benchmark script to include the
isinstance()fallback and it still results in an ~3x speedupChecklist